Shift-And Approach to Pattern Matching in LZW Compressed Text
Abstract
This paper applies the Shift-And approach to the problem of pattern matching in LZW compressed text and gives a new algorithm that solves it. The algorithm is fast in practice when the pattern length is at most 32 or, more generally, the machine word length. After O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. Experimental results show that it runs approximately 1.5 times faster than decompression followed by a simple search using the Shift-And algorithm. Moreover, like the Shift-And algorithm, it can be extended to generalized pattern matching, to pattern matching with k mismatches, and to multiple pattern matching.
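For orientation, the baseline that the abstract compares against (decompress first, then run a plain Shift-And search) can be sketched as follows. This is only an illustrative sketch, not the paper's compressed-domain algorithm: the function names `shift_and_search` and `lzw_decode` are hypothetical, and the LZW variant assumed here uses an initial dictionary of the 256 single-character codes with no code-width limit or reset.

```python
def shift_and_search(pattern, text):
    """Return start positions of all occurrences of pattern in text (Shift-And)."""
    m = len(pattern)
    if m == 0:
        return []
    # Preprocessing: bit i of masks[c] is set iff pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)   # bit that signals a full match
    state = 0               # bit i set iff pattern[0..i] matches a suffix of the text read so far
    hits = []
    for pos, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


def lzw_decode(codes):
    """Decode a list of LZW codes (assumed: 256 initial single-character entries)."""
    table = [chr(i) for i in range(256)]
    out = []
    prev = None
    for code in codes:
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            # Code refers to the entry that is about to be created.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        if prev is not None:
            table.append(prev + entry[0])
        prev = entry
    return "".join(out)


# Baseline: decompress, then search. [97, 98, 256, 258] encodes "abababa".
text = lzw_decode([97, 98, 256, 258])
occurrences = shift_and_search("aba", text)
```

Python integers are unbounded, so this sketch does not show the word-length restriction; in a language with fixed-width integers the state fits in one machine word only when m is at most the word length, which is the case the abstract singles out as fast.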
Similar resources
Tying up the loose ends in fully LZW-compressed pattern matching
We consider a natural generalization of the classical pattern matching problem: given compressed representations of a pattern p[1..M] and a text t[1..N] of sizes m and n, respectively, does p occur in t? We develop an optimal linear time solution for the case when p and t are compressed using the LZW method. This improves the previously known O((n + m) log(n + m)) time solution of G asien...
A Unifying Framework for Compressed Pattern Matching
We introduce a general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), byte-...
Boyer-Moore String Matching over Ziv -
We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The key idea is that, despite that we cannot exactly choose which text characters to inspect, we can still use the characters explicitly represented in those formats to shift the pattern in the text. We present a basic approach and more advanced ones. Despite that the theoretical average complexity does not ...
Almost Optimal Fully LZW-Compressed Pattern Matching
Given two strings: pattern P and text T of lengths |P| = M and |T| = N. A string matching problem is to find all occurrences of pattern P in text T. A fully compressed string matching problem is the string matching problem with input strings P and T given in compressed forms p and t respectively, where |p| = m and |t| = n. We present first, almost optimal, string matching algorithms for LZW-compr...
Approximate String Matching over Ziv-Lempel Compressed
We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) ti...